Haplotype assembly from aligned weighted SNP fragments

Authors

  • Yu-Ying Zhao
  • Ling-Yun Wu
  • Ji-Hong Zhang
  • Rui-Sheng Wang
  • Xiang-Sun Zhang
Abstract

Given an assembled genome of a diploid organism, the haplotype assembly problem can be formulated as the retrieval of a pair of haplotypes from a set of aligned weighted SNP fragments. Known computational formulations (models) of this problem are minimum letter flips (MLF) and weighted minimum letter flips (WMLF; Greenberg et al. (INFORMS J. Comput. 2004, 14, 211-213)). In this paper we show that the general WMLF model is NP-hard even in the gapless case. However, algorithmic solutions can exist for selected variants of WMLF, and we propose a heuristic algorithm based on a dynamic clustering technique. We also introduce a new formulation of the haplotype assembly problem, which we call COMPLETE WMLF (CWMLF). This model and the algorithms implementing it account for the simultaneous presence of multiple kinds of data errors. Extensive computational experiments indicate that algorithmic implementations of the CWMLF model reconstruct haplotypes more accurately than WMLF-based algorithms, which in turn appear to be more accurate than those based on MLF.
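
The abstract does not spell out the dynamic clustering heuristic, so the following is only a minimal sketch of what the WMLF objective measures, under stated assumptions: fragments are aligned strings of equal length over {'0', '1'} with '-' marking uncovered sites, and each letter carries one confidence weight. A haplotype pair is scored by charging every fragment the total weight of the letters it must flip to match the cheaper of the two haplotypes. The search shown is a generic two-cluster alternating refinement (a k-means-style local search with k = 2), not the authors' algorithm; the names `flip_cost`, `wmlf_cost`, `weighted_consensus`, `two_cluster_wmlf`, the gap symbol, and the seeding rule are all hypothetical. Because WMLF is NP-hard, such a local search can only be expected to reach a local optimum.

```python
from typing import List, Sequence, Tuple

GAP = "-"  # assumed symbol for SNP sites a fragment does not cover (hypothetical)


def flip_cost(frag: str, w: Sequence[float], hap: str) -> float:
    """Total weight of the letters in `frag` that must be flipped to agree with `hap`."""
    return sum(wj for c, wj, h in zip(frag, w, hap) if c != GAP and c != h)


def wmlf_cost(frags: List[str], weights: List[Sequence[float]], h1: str, h2: str) -> float:
    """WMLF objective for a fixed haplotype pair: each fragment pays for its cheaper haplotype."""
    return sum(min(flip_cost(f, w, h1), flip_cost(f, w, h2)) for f, w in zip(frags, weights))


def weighted_consensus(frags, weights, n_sites: int, fallback: str) -> str:
    """Per-site weighted majority vote; sites with no votes or a tie keep the `fallback` letter."""
    out = []
    for j in range(n_sites):
        votes = {"0": 0.0, "1": 0.0}
        for f, w in zip(frags, weights):
            if f[j] != GAP:
                votes[f[j]] += w[j]
        if votes["0"] > votes["1"]:
            out.append("0")
        elif votes["1"] > votes["0"]:
            out.append("1")
        else:
            out.append(fallback[j])
    return "".join(out)


def two_cluster_wmlf(frags, weights, max_iter: int = 50) -> Tuple[str, str, float]:
    """Hypothetical 2-cluster local search for WMLF (illustration only, not the paper's method)."""
    n = len(frags[0])
    # Seed: first fragment with gaps filled by '0', plus its bitwise complement.
    h1 = "".join("0" if c == GAP else c for c in frags[0])
    h2 = "".join("1" if c == "0" else "0" for c in h1)
    for _ in range(max_iter):
        # Assignment step: each fragment joins the haplotype it disagrees with least.
        to_h1 = [flip_cost(f, w, h1) <= flip_cost(f, w, h2) for f, w in zip(frags, weights)]
        g1 = [(f, w) for f, w, a in zip(frags, weights, to_h1) if a]
        g2 = [(f, w) for f, w, a in zip(frags, weights, to_h1) if not a]
        # Update step: weighted majority per site inside each cluster.
        new_h1 = weighted_consensus([f for f, _ in g1], [w for _, w in g1], n, h1)
        new_h2 = weighted_consensus([f for f, _ in g2], [w for _, w in g2], n, h2)
        if (new_h1, new_h2) == (h1, h2):
            break  # converged: neither step changes the haplotypes
        h1, h2 = new_h1, new_h2
    return h1, h2, wmlf_cost(frags, weights, h1, h2)


# Toy data: four clean fragments plus one with a low-confidence error at site 2.
frags = ["010-", "-101", "101-", "-010", "0111"]
weights = [[1.0] * 4 for _ in range(4)] + [[1.0, 1.0, 0.2, 1.0]]
print(two_cluster_wmlf(frags, weights))  # ('0101', '1010', 0.2)
```

In the toy run, the erroneous letter in the last fragment carries weight 0.2, so the weighted majority overrules it and the only flip charged is that cheap one. Under unweighted MLF the same flip would cost a full letter, which is the intuition behind weighting letters by confidence.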

Related articles

A model of higher accuracy for the individual haplotyping problem based on weighted SNP fragments and genotype with errors

MOTIVATION In genetic studies of complex diseases, haplotypes provide more information than genotypes. However, haplotyping is much more difficult than genotyping using biological techniques. Therefore effective computational techniques have been in demand. The individual haplotyping problem is the computational problem of inducing a pair of haplotypes from an individual's aligned SNP fragments...

A Markov chain model for haplotype assembly from SNP fragments.

Single nucleotide polymorphism (SNP) is the most frequent form of human genetic variation and is important for medical diagnosis and tracking disease genes. A haplotype is a sequence of SNPs from a single copy of a chromosome, and haplotype assembly from SNP fragments is based on DNA fragments with SNPs and the methodology of shotgun sequence assembly. In contrast to conventional combinatoria...

Statistical Analysis of SNP Alignment From Homologous Chromosomes

In some sequencing projects, the targets contain two homologous chromosomes and our aim is to distinguish between polymorphisms and sequencing errors. Duplication is an important aspect of evolution, and paralogous regions could be mis-aligned together in one assembly. In this case our goal is to unravel duplicated regions. The difficulty lies in the fact that origins of fragments are unknown....

Haplotype reconstruction from SNP fragments by minimum error correction

MOTIVATION Haplotype reconstruction based on aligned single nucleotide polymorphism (SNP) fragments aims to infer a pair of haplotypes from localized polymorphism data gathered through short genome fragment assembly. An important computational model of this problem is the minimum error correction (MEC) model, which has been discussed in several studies. The model retrieves a pair of haplotype...

HapCompass: A Fast Cycle Basis Algorithm for Accurate Haplotype Assembly of Sequence Data

Genome assembly methods produce haplotype phase ambiguous assemblies due to limitations in current sequencing technologies. Determining the haplotype phase of an individual is computationally challenging and experimentally expensive. However, haplotype phase information is crucial in many bioinformatics workflows such as genetic association studies and genomic imputation. Current computational ...

Journal:
  • Computational Biology and Chemistry

Volume 29, Issue 4

Publication date: 2005